Data Analytics for Finance
BM17FI · Rotterdam School of Management
All commands used in Assignments 1–6
This page lists every Stata command used in the course assignments, grouped by category. Each entry includes a short description, a minimal example, and a link to the official documentation. Examples are illustrative; adapt variable names and options to your own data.
Environment & Setup
Commands for preparing your Stata session before working with data.
| Command | Description | Example | Docs |
|---|---|---|---|
| `clear` | Remove data (and optionally all objects) from memory | `clear all` | 🔗 |
| `set more off` | Disable output pagination so results scroll continuously | `set more off` | 🔗 |
| `set scheme` | Set the default appearance for all subsequent graphs | `set scheme stcolor` | 🔗 |
| `set linesize` | Set the width (in characters) of the output window | `set linesize 120` | 🔗 |
| `pwd` | Print the current working directory | `pwd` | 🔗 |
| `global` | Define a global macro, accessible anywhere in your session | `global datadir "$base/data"` | 🔗 |
| `local` | Define a local macro, accessible only within the current context | `local cutoff = td(1jan2020)` | 🔗 |
| `scalar` | Store a single numeric or string value | `scalar threshold = 0.05` | 🔗 |
| `display` | Print text, macro values, or expressions to the console | `display "Obs: " _N` | 🔗 |
| `ssc install` | Install a user-written package from the SSC archive | `ssc install estout, replace` | 🔗 |
Example
* Typical session setup
clear all
set more off
set scheme stcolor
global base "`c(pwd)'"
global raw "$base/data/raw"
display "Working directory: $base"
Data Import & Export
Commands for reading data into Stata and saving results to disk.
| Command | Description | Example | Docs |
|---|---|---|---|
| `import delimited` | Import a CSV (or other delimited) file | `import delimited "$raw/sales.csv", clear` | 🔗 |
| `use` | Load a Stata .dta file | `use "$processed/panel.dta", clear` | 🔗 |
| `save` | Save the current dataset as a .dta file | `save "$processed/clean.dta", replace` | 🔗 |
| `erase` | Delete a file from disk | `erase "$processed/temp.dta"` | 🔗 |
Example
* Import a CSV file and save as Stata format
import delimited "$raw/monthly_sales.csv", clear varnames(1)
save "$processed/monthly_sales.dta", replace
Data Exploration
Commands for inspecting and understanding your dataset.
| Command | Description | Example | Docs |
|---|---|---|---|
| `describe` | Show variable names, types, labels, and storage info | `describe` | 🔗 |
| `list` | Display selected observations in a table | `list id date price in 1/5` | 🔗 |
| `count` | Count observations, optionally with a condition | `count if missing(revenue)` | 🔗 |
| `summarize` | Compute summary statistics (mean, sd, min, max, etc.) | `summarize price, detail` | 🔗 |
| `tabulate` | Produce a one-way or two-way frequency table | `tabulate industry year` | 🔗 |
| `tabstat` | Display compact summary statistics, optionally by group | `tabstat revenue profit, by(region) stat(n mean sd)` | 🔗 |
| `table` | Flexible table of statistics | `table region year, stat(mean sales) nformat(%9.2f)` | 🔗 |
| `correlate` | Display a correlation matrix | `correlate x1 x2 x3` | 🔗 |
| `misstable summarize` | Summarize patterns of missing data | `misstable summarize revenue profit` | 🔗 |
| `duplicates report` | Report duplicate observations | `duplicates report id date` | 🔗 |
Example
* Quick data audit
describe
summarize price volume, detail
tabstat price volume, by(industry) stat(n mean sd min max)
count if missing(price)
misstable summarize
Data Management: Variables
Commands for creating, modifying, and labeling variables.
| Command | Description | Example | Docs |
|---|---|---|---|
| `generate` | Create a new variable | `gen log_sales = ln(sales)` | 🔗 |
| `replace` | Modify values of an existing variable | `replace status = 1 if year >= 2020` | 🔗 |
| `drop` | Remove variables or observations | `drop temp_var` | 🔗 |
| `keep` | Keep only specified variables or observations | `keep if year >= 2015` | 🔗 |
| `rename` | Rename a variable | `rename total_assets ta` | 🔗 |
| `order` | Reorder variables in the dataset | `order id date price volume` | 🔗 |
| `encode` | Convert a string variable to a labeled numeric variable | `encode country, gen(country_num)` | 🔗 |
| `format` | Set the display format of a variable | `format date %td` | 🔗 |
| `label variable` | Attach a descriptive label to a variable | `label variable log_sales "Log of sales"` | 🔗 |
| `label define` | Define a named set of value labels | `label define yesno 0 "No" 1 "Yes"` | 🔗 |
| `label values` | Assign a value-label set to a variable | `label values treated yesno` | 🔗 |
| `xtile` | Create quantile-based categories (terciles, quartiles, etc.) | `xtile size_q = total_assets, nq(3)` | 🔗 |
Example
* Create and label a binary treatment indicator
gen post = (date >= td(1jan2020))
label variable post "Post-treatment indicator"
label define post_lbl 0 "Pre" 1 "Post"
label values post post_lbl
tabulate post
Data Management: Observations & Datasets
Commands for sorting, merging, reshaping, and aggregating data.
| Command | Description | Example | Docs |
|---|---|---|---|
| `sort` | Sort observations by one or more variables | `sort firm_id date` | 🔗 |
| `merge` | Merge the current dataset with another on key variable(s) | `merge m:1 firm_id using "firms.dta"` | 🔗 |
| `collapse` | Aggregate data to a summary level | `collapse (mean) avg_ret=ret, by(industry year)` | 🔗 |
| `reshape` | Reshape data between wide and long formats | `reshape wide sales, i(firm_id date) j(product)` | 🔗 |
| `preserve` | Save a snapshot of the current data in memory | `preserve` | 🔗 |
| `restore` | Restore data from a previous preserve | `restore` | 🔗 |
Example: Merging datasets
* Merge daily stock data with firm characteristics
use "daily_prices.dta", clear
merge m:1 firm_id using "firm_characteristics.dta"
tabulate _merge
keep if _merge == 3
drop _merge
Example: Collapse and reshape
* Compute average price by firm and year, then reshape to wide
preserve
collapse (mean) avg_price=price, by(firm_id year)
reshape wide avg_price, i(firm_id) j(year)
restore
By-group Operations
Commands that operate separately within groups defined by one or more variables.
| Command | Description | Example | Docs |
|---|---|---|---|
| `bysort` / `by` | Execute a command separately for each group | `bysort firm_id (date): gen cumret = sum(ret)` | 🔗 |
| `egen` | Extended generate: group-aware functions | `by firm_id: egen avg_ret = mean(ret)` | 🔗 |
Example
* Calculate running sum and group mean within each firm
sort firm_id date
bysort firm_id (date): gen cum_return = sum(daily_ret)
by firm_id: egen firm_avg_ret = mean(daily_ret)
by firm_id: egen firm_sd_ret = sd(daily_ret)
egen functions
Common `egen` functions: `mean()`, `sd()`, `max()`, `min()`, `total()`, `count()`, `rank()`. When prefixed with `by` or `bysort`, each function is computed within the groups that prefix defines.
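The difference between a group-level `egen` statistic and a running `gen ... = sum()` is easy to miss, so here is a minimal sketch on a made-up toy dataset. The variable names (`firm_id`, `day`, `ret`) and all values are assumptions for illustration, not course data.

```stata
* Toy data: two firms, three days each (values are made up)
clear
input firm_id day ret
1 1  0.01
1 2  0.02
1 3 -0.01
2 1  0.03
2 2  0.00
2 3  0.01
end

* egen total(): one value per group, repeated on every row of that group
bysort firm_id: egen total_ret = total(ret)

* gen sum(): running sum that accumulates row by row within the group
bysort firm_id (day): gen cum_ret = sum(ret)

* egen count(): number of non-missing observations per group
bysort firm_id: egen n_obs = count(ret)

list, sepby(firm_id)
```

On the last row of each firm the running sum should match the group total; on earlier rows it should not, which is the quickest way to tell the two apart.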
Panel Data Setup
Commands for declaring and inspecting panel (longitudinal) data structures.
| Command | Description | Example | Docs |
|---|---|---|---|
| `xtset` | Declare the panel variable and time variable | `xtset firm_id date` | 🔗 |
| `xtdescribe` | Describe the panel structure (balance, gaps, span) | `xtdescribe` | 🔗 |
| `L.` | Time-series lag operator (requires xtset or tsset) | `gen ret = ln(price / L.price)` | 🔗 |
Example
* Declare panel and compute log returns
xtset firm_id date
xtdescribe
gen log_return = ln(price / L.price)
label variable log_return "Daily log return"
After `xtset`, you can use: `L.` (lag), `L2.` (two-period lag), `F.` (lead), `D.` (first difference). These operate within each panel unit automatically.
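To see what "within each panel unit" means in practice, the following sketch applies the operators to a small invented panel. The firm IDs, years, and prices are assumptions chosen only so the output is easy to check by eye.

```stata
* Toy panel: two firms observed for three years (made-up prices)
clear
input firm_id year price
1 2018 10
1 2019 12
1 2020 15
2 2018 20
2 2019 18
2 2020 21
end

xtset firm_id year

* Lag, two-period lag, lead, and first difference
gen lag_price  = L.price
gen lag2_price = L2.price
gen lead_price = F.price
gen d_price    = D.price

list, sepby(firm_id)
```

The first year of each firm has a missing lag and difference: the operators do not reach back into the previous firm's rows, which is the behaviour a plain `price[_n-1]` would not give you without a `by` prefix.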
Statistical Tests
Commands for hypothesis testing and distributional checks.
| Command | Description | Example | Docs |
|---|---|---|---|
| `ttest` | One- or two-sample t-test | `ttest score, by(treatment)` | 🔗 |
| `swilk` | Shapiro–Wilk test for normality | `swilk residuals` | 🔗 |
| `estat hettest` | Breusch–Pagan test for heteroskedasticity (after regress) | `estat hettest` | 🔗 |
Example
* Compare mean exam scores between two groups
ttest exam_score, by(study_group)
* After a regression, check residual normality
regress y x1 x2
predict resid, residuals
swilk resid
Regression Analysis
Core commands for estimating linear models.
| Command | Description | Example | Docs |
|---|---|---|---|
| `regress` | OLS linear regression | `regress y x1 x2 x3, robust` | 🔗 |
| `xtreg` | Panel data regression (fixed or random effects) | `xtreg y x1 x2, fe vce(cluster firm_id)` | 🔗 |
| `predict` | Generate predicted values or residuals after estimation | `predict yhat, xb` | 🔗 |
Example: OLS with robust standard errors
regress wage education experience age, robust
predict wage_hat
predict resid, residuals
Example: Panel fixed-effects regression
xtset firm_id year
xtreg revenue marketing_spend rd_spend, fe vce(cluster firm_id)
Storing & Comparing Estimates
Commands for saving regression results and displaying them side by side.
| Command | Description | Example | Docs |
|---|---|---|---|
| `estimates store` | Store the current estimation result under a name | `estimates store model1` | 🔗 |
| `estimates restore` | Restore a previously stored estimation | `estimates restore model1` | 🔗 |
| `estimates table` | Display stored estimates side by side | `estimates table model1 model2, star` | 🔗 |
| `estimates dir` | List all stored estimation results | `estimates dir` | 🔗 |
Example
* Run two specifications and compare
regress y x1 x2, robust
estimates store ols_base
regress y x1 x2 x3 x4, robust
estimates store ols_full
estimates table ols_base ols_full, star stats(N r2_a)
Formatted Regression Tables
Commands from the estout package for producing publication-ready tables.
| Command | Description | Example | Docs |
|---|---|---|---|
| `eststo` | Store an estimation result (shorthand) | `eststo m1: regress y x1, robust` | 🔗 |
| `esttab` | Export a formatted regression table (screen, LaTeX, CSV, …) | `esttab m1 m2 using "table.tex", se label replace` | 🔗 |
| `estpost` | Post results from non-estimation commands for use with esttab | `estpost summarize y x1 x2` | 🔗 |

Install with `ssc install estout, replace`. Full documentation: estout homepage.
Example
* Build a regression table with three models
eststo clear
eststo m1: regress y x1, robust
eststo m2: regress y x1 x2, robust
eststo m3: regress y x1 x2 x3, robust
esttab m1 m2 m3, ///
    se star(* 0.10 ** 0.05 *** 0.01) ///
    label r2 N ///
    title("Regression Results")
Example: Summary statistics table
estpost summarize price volume market_cap
esttab, cells("mean(fmt(2)) sd(fmt(2)) min max count") nomtitle nonumber
Graphics: Core Plot Types
| Command | Description | Example | Docs |
|---|---|---|---|
| `twoway line` | Line plot (typically for time series) | `twoway line price date, title("Price Over Time")` | 🔗 |
| `twoway scatter` | Scatter plot | `scatter y x, mlabel(name)` | 🔗 |
| `twoway lfit` | Overlay a linear-fit line | `twoway (scatter y x) (lfit y x)` | 🔗 |
| `twoway function` | Plot an arbitrary function | `twoway function y=0, range(x) lcolor(red)` | 🔗 |
| `twoway rcap` | Range plot with capped spikes (confidence intervals) | `twoway rcap ci_hi ci_lo x` | 🔗 |
| `histogram` | Histogram with optional normal-density overlay | `histogram ret, normal bin(40)` | 🔗 |
| `graph export` | Save the current graph to a file (PNG, PDF, SVG, …) | `graph export "fig.png", replace width(1200)` | 🔗 |
Example: Time series with event marker
twoway line price date if firm_id == 42, ///
    title("Daily Stock Price") ///
    xtitle("Date") ytitle("Price (EUR)") ///
    xline(`event_date', lpattern(dash) lcolor(red))
graph export "$figures/price_plot.png", replace width(1400)
Example: Multi-series comparison
twoway ///
    (line price date if industry == "Tech", lcolor(navy)) ///
    (line price date if industry == "Banks", lcolor(maroon)), ///
    legend(label(1 "Technology") label(2 "Banking") ///
        position(6) cols(2)) ///
    xtitle("Date") ytitle("Price (EUR)")
Example: Scatter with fitted line
twoway (scatter wage education) ///
    (lfit wage education, lcolor(red)), ///
    title("Wages vs. Education") ///
    xtitle("Years of Education") ytitle("Hourly Wage")
Graphics: Common Options
A quick reference for the most-used graph options across the assignments.
| Option | Purpose | Example |
|---|---|---|
| `title()` | Main graph title | `title("Stock Returns")` |
| `subtitle()` | Subtitle below the title | `subtitle("2015–2020")` |
| `xtitle()` / `ytitle()` | Axis labels | `xtitle("Date")` |
| `note()` | Footnote below the graph | `note("Source: Compustat")` |
| `legend()` | Control the legend | `legend(label(1 "Firm A") position(6) cols(2))` |
| `xline()` / `yline()` | Add reference lines | `xline(21550, lpattern(dash) lcolor(red))` |
| `lcolor()` | Line colour | `lcolor(navy)` |
| `lwidth()` | Line thickness | `lwidth(medium)` |
| `lpattern()` | Line pattern (solid, dash, dot, …) | `lpattern(dash)` |
| `mcolor()` / `mlabel()` | Marker colour / labels | `mcolor(navy) mlabel(name)` |
| `by()` | Create a panel of graphs, one per group | `histogram ret, by(firm)` |
| `name(, replace)` | Store the graph in memory under a name | `name(g1, replace)` |
| `xlabel()` / `ylabel()` | Customise axis tick marks | `xlabel(, format(%td) angle(45))` |

Use `///` at the end of a line to continue the command on the next line. This keeps long graph commands readable.
Programming & Flow Control
| Command | Description | Example | Docs |
|---|---|---|---|
| `foreach` | Loop over a list of items | `` foreach v in x1 x2 x3 { summarize `v' } `` | [🔗](https://www.stata.com/manuals/pforeach.pdf) |
| `forvalues` | Loop over a numeric range | `` forvalues i = 1/5 { display `i' } `` | 🔗 |
| `if` / `else` | Conditional execution of code blocks | `if _rc == 0 { display "OK" }` | 🔗 |
| `capture` | Run a command and suppress any error; stores return code in `_rc` | `capture confirm file "data.dta"` | 🔗 |
| `quietly` | Run a command but suppress all output | `quietly regress y x1` | 🔗 |
| `assert` | Assert that a condition holds; error if it does not | `assert _N > 0` | 🔗 |
| `confirm` | Confirm that a file or variable exists | `confirm file "$raw/data.dta"` | 🔗 |
| `levelsof` | Store the unique values of a variable in a local macro | `levelsof region, local(regions)` | 🔗 |
| `return list` | Display saved results from the last r-class command | `return list` | 🔗 |
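The table shows the loops on a single line for compactness; in a do-file the body goes on its own lines between the braces. The sketch below writes `forvalues` and a `levelsof`-driven `foreach` that way. It assumes the `auto` example dataset that ships with Stata (loaded via `sysuse`), which is not part of the course data.

```stata
* Example dataset bundled with Stata (an assumption, not course data)
sysuse auto, clear

* forvalues: loop over a numeric range
forvalues i = 1/3 {
    display "Iteration `i'"
}

* levelsof + foreach: loop over the distinct values of a variable
levelsof rep78, local(levels)
foreach r of local levels {
    quietly count if rep78 == `r'
    display "rep78 = `r': " r(N) " cars"
}
```

`levelsof` skips missing values by default, so the loop only visits categories that actually occur in the data.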
Example
* Loop over variables and summarize each
foreach var in revenue profit assets {
    display "--- `var' ---"
    summarize `var'
}
* Check all expected files exist
foreach f in "q1.dta" "q2.dta" "q3.dta" {
    capture confirm file "$raw/`f'"
    if _rc != 0 {
        display as error "Missing: `f'"
    }
}
Most Stata commands store results you can reuse:
- r-class (e.g., `summarize`): access with `r(mean)`, `r(sd)`, `r(N)`, etc.
- e-class (e.g., `regress`): access with `e(N)`, `e(r2)`, `e(cmd)`, etc.
- Coefficients: `_b[varname]` and `_se[varname]` after any estimation command.
- System values: `_N` (total obs), `_n` (current obs number), `_rc` (last return code).
- System constants: `c(pwd)`, `c(k)` (number of variables), `c(N)` (number of obs).
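A short sketch of how these stored results are typically picked up and reused. It assumes the `auto` example dataset bundled with Stata; the variable `price_demeaned` and the local `mean_price` are names invented for this illustration.

```stata
* Example dataset bundled with Stata (an assumption, not course data)
sysuse auto, clear

* r-class: summarize leaves its results in r()
summarize price
display "Mean price: " r(mean) "  (N = " r(N) ")"

* Copy r() results into a local before another command overwrites them
local mean_price = r(mean)
gen price_demeaned = price - `mean_price'

* e-class: regress leaves its results in e(), coefficients in _b[] and _se[]
regress price weight
display "Observations: " e(N)
display "R-squared:    " e(r2)
display "t-stat on weight: " _b[weight] / _se[weight]

* System values and constants
display "Rows in memory: " _N "   variables: " c(k)
```

Saving `r(mean)` to a local first is a defensive habit: the next r-class command replaces everything in `r()`, so results that are needed later should be copied out immediately.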
Functions Reference
Key functions used inside `generate`, `replace`, `if`, and other expressions.
Math functions
| Function | Description | Example |
|---|---|---|
| `ln(x)` | Natural logarithm | `gen log_assets = ln(total_assets)` |
| `exp(x)` | Exponential (\(e^x\)) | `gen level = exp(log_ret)` |
| `abs(x)` | Absolute value | `gen abs_ret = abs(ret)` |
| `sum(x)` | Running (cumulative) sum; used within `bysort: gen` | `bysort id (date): gen cumsum = sum(ret)` |
Date functions
| Function | Description | Example |
|---|---|---|
| `td(DDmonYYYY)` | Convert a literal date string to a Stata date number | `local d = td(15mar2020)` |
| `date(s, mask)` | Parse a string variable to a Stata date number | `gen stata_date = date(date_str, "YMD")` |
| `mdy(m, d, y)` | Create a date from month, day, and year values | `gen event = mdy(9, 18, 2015)` |
| `year(d)` | Extract the year from a date | `gen yr = year(date)` |
| `month(d)` | Extract the month from a date | `gen mo = month(date)` |
| `mofd(d)` | Convert a daily date to a monthly date | `gen month_date = mofd(date)` |
| `dofm(m)` | Convert a monthly date to the first day of that month | `gen first_day = dofm(month_date)` |
String & logical functions
| Function | Description | Example |
|---|---|---|
| `missing(x)` | Returns 1 if x is missing, 0 otherwise | `count if missing(price)` |
| `inlist(x, a, b, …)` | Returns 1 if x equals any listed value | `keep if inlist(country, "DE", "FR", "NL")` |
| `strpos(s, sub)` | Position of substring (0 if not found) | `gen has_ag = strpos(name, "AG") > 0` |

After creating a Stata date variable, apply a display format so dates are human-readable: `format date %td` (daily), `format month_date %tm` (monthly).
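As a worked illustration of the date functions and formats above, the sketch below parses three invented ISO-style date strings and derives monthly dates from them. The variable names mirror the table rows; the input values are assumptions.

```stata
* Toy input: dates stored as strings in year-month-day order (made up)
clear
input str10 date_str
"2020-03-15"
"2020-04-01"
"2021-12-31"
end

* Parse the string; the "YMD" mask matches the year-month-day order
gen date = date(date_str, "YMD")
format date %td

* Monthly date and the first day of that month
gen month_date = mofd(date)
format month_date %tm
gen first_day = dofm(month_date)
format first_day %td

* Extract calendar components
gen yr = year(date)
gen mo = month(date)

list
```

Without the `format` lines the variables still hold correct values, but `list` would show them as raw day or month counts since 1 January 1960 rather than as readable dates.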
User-Written Packages
These packages are not part of base Stata and must be installed before first use.
| Package | Description | Install | Docs |
|---|---|---|---|
| `reghdfe` | Linear regression with multiple levels of fixed effects | `ssc install reghdfe, replace` | 🔗 |
| `ftools` | Fast Mata routines (required by reghdfe) | `ssc install ftools, replace` | 🔗 |
| `estout` | Suite for formatted tables (esttab, eststo, estpost) | `ssc install estout, replace` | 🔗 |
| `coefplot` | Coefficient plots from stored estimates | `ssc install coefplot, replace` | 🔗 |
| `distinct` | Count the number of distinct values of a variable | `ssc install distinct, replace` | 🔗 |
| `rangestat` | Calculate statistics over observation ranges / rolling windows | `ssc install rangestat, replace` | 🔗 |
Example: reghdfe
* TWFE regression absorbing firm and time fixed effects
reghdfe outcome treatment controls, ///
    absorb(firm_id year#month) vce(cluster firm_id)
Example: rangestat
* Rolling 252-day standard deviation of returns
rangestat (sd) rolling_sd = daily_ret, ///
    interval(date -252 -1) by(firm_id)
Shell Commands
| Command | Description | Example | Docs |
|---|---|---|---|
| `!` (prefix) | Execute an operating-system shell command from within Stata | `!ls -lh "$figures"` | 🔗 |
BM17FI · Academic Year 2025–26
Created by: Caspar David Peter
© 2026 Rotterdam School of Management